Line plot
=========

**Line plot** displays n-grams as a series of data points connected by straight line segments. The graph displays unigrams (single words) and bigrams over a monthly or yearly period. It works best with homogeneous datasets with relatively few periods (max T=10).

-------------------------------------

**Coding example:**

**Use case:** Essential topics in newspaper headlines

**Data**: Million News Headlines dataset, source: `Australian Broadcasting Corporation `_, data licence: `CC0 1.0: Public Domain `_.

Coding:

.. code-block:: python
   :linenos:

   import pandas as pd
   from arabica import cappuccino

.. code-block:: python
   :linenos:

   data = pd.read_csv('abcnews_data.csv', encoding='utf8')

The data looks like this:

.. csv-table::
   :header: "headline", "date"
   :widths: 90, 10
   :align: left

   "aba decides against community broadcasting licence", 2003-2-19
   "act fire witnesses must be aware of defamation", 2003-2-19

``cappuccino`` processes the text in these steps:

* **additional stop words** cleaning, if ``skip is not None``

* **lowercasing**: headlines are made lowercase so that capital letters don't affect n-gram calculations (e.g., "Tree" is not treated differently from "tree"), if ``lower_case = True``

* **punctuation** cleaning, performed automatically

* **stop words** removal, if ``stopwords is not None``

* **digits** removal, if ``numbers = True``

* n-gram frequencies for each headline are calculated, aggregated by month, and displayed in a line plot.

.. code-block:: python
   :linenos:

   cappuccino(text = data['headline'],
              time = data['date'],
              date_format = 'us',                # Uses US-style date format to parse dates
              plot = 'line',
              ngram = 1,                         # N-gram size, 1 = unigram, 2 = bigram
              time_freq = 'M',                   # Aggregation period, 'M' = monthly, 'Y' = yearly
              max_words = 6,                     # Displays 6 most frequent unigrams (words) for each period
              stopwords = ['english'],           # Remove English stopwords
              skip = ['covid','donald trump'],   # Remove additional stop words
              numbers = True,                    # Remove numbers
              lower_case = True)                 # Lowercase text

Here is the output:

.. image:: line_4.png
   :height: 400 px
   :width: 900 px
   :alt: alternate text
   :align: left

-----

Download the Jupyter notebook with the code and the data `here `_.
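
To make the preprocessing and aggregation steps listed earlier more concrete, here is a minimal sketch in plain pandas. It is not Arabica's implementation: the helper names ``clean`` and ``monthly_top_unigrams``, the tiny ``TOY_STOPWORDS`` set, and the extra sample headlines are all assumptions made for illustration, and the dates are written zero-padded to keep parsing unambiguous.

```python
import re
import string
from collections import Counter

import pandas as pd

# Hypothetical stand-in for a real English stop word list (not Arabica's).
TOY_STOPWORDS = {"of", "be", "must", "against", "after", "the", "a"}


def clean(text, stopwords=TOY_STOPWORDS, skip=(), numbers=True, lower_case=True):
    """Return the unigram tokens left after the cleaning steps."""
    for phrase in skip:  # additional stop words, possibly multi-word
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    if lower_case:
        text = text.lower()
    # Punctuation is always stripped.
    text = text.translate(str.maketrans("", "", string.punctuation))
    if numbers:
        text = re.sub(r"\d+", " ", text)
    return [tok for tok in text.split() if tok.lower() not in stopwords]


def monthly_top_unigrams(frame, max_words=1):
    """Count unigrams per calendar month and keep the most frequent ones."""
    periods = pd.to_datetime(frame["date"]).dt.to_period("M")
    rows = []
    for period, texts in frame["headline"].groupby(periods):
        counts = Counter(tok for text in texts for tok in clean(text))
        for word, freq in counts.most_common(max_words):
            rows.append({"period": str(period), "word": word, "count": freq})
    return pd.DataFrame(rows)


sample = pd.DataFrame({
    "headline": [
        "aba decides against community broadcasting licence",
        "act fire witnesses must be aware of defamation",
        "fire crews battle blaze",                   # made-up headline
        "council lifts fire ban after fire season",  # made-up headline
    ],
    "date": ["2003-02-19", "2003-02-19", "2003-02-20", "2003-03-01"],
})

top = monthly_top_unigrams(sample)
print(top)
```

Each row of ``top`` holds one period, one word, and its count, which is the long format a line plot needs. With matplotlib installed, ``top.pivot(index="period", columns="word", values="count").plot()`` would draw one line per word.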